Matching

In this part of the assignment, we will implement regular-expression matching, using derivatives. You will do your work in the file src/Evaluation.hs.

To Do

  1. By referring to the equations from the course materials or to the Owens et al. paper, complete the definition of deriv.

  2. Get preliminary confirmation that your code works by running the tests in test/MatchSpec.hs (which tries regexMatch on a collection of predefined regular expressions and strings, and reports the results). You can run the tests with the command:

    stack test --test-arguments "--match /Match"
  3. Consider adding tests to test/MatchSpec.hs that cover the remaining forms of Regex, making sure that you are testing all the regular-expression operators with both matches and mismatches. You are also welcome to add more regular-expression definitions to the file test/ExampleRegexes.hs, in addition to the examples there already.

Hints / Comments

  • The function map may be useful.

  • The regular expression Concat [r1, r2, ..., rn] is equivalent to Concat [r1, Concat[r2, ..., rn]]; and Concat [] is equivalent to Epsilon.

  • In any DFA that accepts language L, running a string w through the DFA ends in a state whose language is \(∂_w(L)\).

  • The Owens et al. paper writes the derivative function for concatenation using a helper function ν that either returns ε or . For implementation purposes, I recommend using the equivalent but more explicit formula in the course materials, i.e., \(∂_a (r s) = ((∂_a r) s)\) when r is not nullable, and \(∂_a (r s) = ( (∂_a r) s) | (∂_a s) )\) when r is nullable.

  • If you run stack gchi you can always try out some expressions for yourself, e.g.,

    deriv 'a' (Concat [Letter 'a', Letter 'b'])
  • When you write new tests, be very careful that you write down the correct answer! Negation in particular can be tricky: at first glance you might think the regular expression (¬(ab))* would not match ab or abab, but in fact that pattern matches every string! (Every string can be divided into a finite number of pieces that are not ab, e.g., by dividing the string into its individual letters.)